Improving Training Data using Error Analysis of Urdu Speech Recognition System
Abstract
Access to information is vital for development in today's age. However, the average Pakistani citizen, and the visually impaired community in Pakistan in particular, faces several barriers to it. The literacy rate in Pakistan is low: according to UNICEF, it was 60 percent [1]. This leaves roughly 40 percent of the population unable to access information that is available only in textual form. This problem can be addressed by creating an interface between illiterate people and technology, and automatic speech recognition (ASR) can provide such an interface. Achieving this goal requires a speaker-independent, continuous and spontaneous speech recognition system and its integration with new technologies. This approach bypasses the barriers, e.g. literacy, language and connectivity, that Pakistani citizens face in accessing online content. Moreover, screen readers are useful to people who are blind, visually impaired or illiterate, and this technology often works in combination with other technologies such as speech recognition and text-to-speech systems.

The current work investigates the issues in the read and spontaneous speech recognition system developed in [3], whose word error rate was 60%. The objective was to investigate the recognition issues. To this end, multiple experiments were conducted. The speech data was cleaned using error analysis techniques, and the distribution of phonemes and their recognition results were analyzed. Based on these results, the possibility of developing a minimally balanced corpus for speech recognition systems was explored.

Chapter 1 Background and Introduction

The task of an automatic speech recognition (ASR) engine is to convert a speech signal into textual form [2].
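Because an ASR engine maps speech to text, its output is usually scored by word error rate (WER), the metric behind the 60% figure quoted from [3]. The following is a minimal sketch, assuming the standard definition (substitutions, deletions and insertions from a word-level edit distance, divided by the number of reference words). The function name and the romanized example words are illustrative only; the source does not state how [3] computed its score.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization and unit cost for each
    substitution, deletion and insertion (the standard WER definition).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: the recognizer drops the middle word.
    print(word_error_rate("aik do teen", "aik teen"))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error ratio rather than a true percentage of words.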
This engine can be integrated with many modern technologies to bridge the gap between Pakistan's illiterate communities and online information. It can be equally helpful to the blind community and to those who are literate but lack the technical skills to operate information and communication technologies (ICTs). It also offers students a challenging task: communicating with robots through speech rather than electrical signals. It can be integrated, e.g., 1) with computers, commonly known as a human-computer interface, and 2) with mobile technology to access information from online sources. Through spoken dialog systems, a user can access online information verbally over a mobile channel. The information is translated from any other language into the user's native language and then converted to speech. This technology overcomes all three barriers: literacy, language and connectivity. It serves as a simple and efficient information access interface, and it can be equally beneficial for the visually impaired community.

Spoken dialog systems have been developed in a number of languages for different domains, e.g. weather, travel information, flight scheduling and customer support. No such system exists for Urdu, so the design of dialog systems developed in other languages can be used as a guideline. For example, Jupiter was developed to provide weather forecasts for 500 cities over a telephone channel. A user can access weather information, available online, for several days. It also provides humidity, sunrise, precipitation, wind speed, etc. The user accesses the system by calling a toll-free number. An auto receptionist welcomes the user and indicates that the channel is free with a high tone. The user can then make any weather-related query. When the user stops speaking, the system plays a low tone to indicate that the channel is busy.
The '*' key can be pressed to interrupt the system.

One of the key components of a spoken dialog system is the speech recognition engine. The speech recognizer in such systems plays the same role that the mind plays in human-to-human communication. A source-channel model is usually used to develop speech recognition systems. In this model, the listener's mind decodes the source word sequence W delivered by the other person. The sequence passes through a noisy communication channel, which comprises the speaker and the speech signal, i.e. the audio waveform. Finally, the listener's mind aims to decode the acoustic signal X into a word sequence Ŵ that ideally matches the original word sequence W [16].

The signal processing module processes the speech signal and extracts features for the decoder, removing redundant information from the signal. The decoder uses acoustic and language models to generate the word sequence for the input feature vectors [16]. Acoustic models represent knowledge about phonetics, acoustics, environment and microphone variability, gender differences among speakers, etc. Language models represent the system's knowledge of which words and word sequences are possible.

Speech recognition poses many challenges, such as speaker characteristics, background noise interference, grammatical variation and non-native accents. A good speech recognition system must contend with all of these. The acoustic uncertainties of the different accents and speaking styles of individual speakers are compounded by the lexical complexity represented in the language model [16].

Chapter 2 Introduction to Speech Recognition

ASR technology has been developed for many languages, e.g. English and Japanese. It has also been developed for Urdu, but its recognition accuracy is not good, as described in [3].
Several variables affect the performance of an automatic speech recognition system. These variables should be restricted to some degree to improve the performance of the ASR engine, e.g. 1) accent of speakers, 2) vocabulary size, 3) gender and age, 4) background noise level and 5) continuous versus isolated words [3]. One way to limit the effect of these variables is to build gender-dependent recognition modules.

ASR engines can be categorized as small, medium and large vocabulary systems. Small vocabulary ASR systems are usually digit recognition systems based on counting words, e.g. aik (one), do (two), teen (three), with a vocabulary size in the tens, whereas medium and large vocabulary ASR engines cover connected words or complete sentences, with vocabulary sizes above 20,000. These sentences can in turn be categorized as read or spontaneous speech.

The recording environment is also a key factor that affects performance. A good environment is an anechoic (echo-free) chamber, but a system trained in such an environment will not work in noisy environments and cannot be used in everyday life. Because it is difficult to record data in working environments, one approach is to record real noise from the working environment and superimpose it on noise-free recordings.

2.1 Speech Recognition Architecture

The speech recognition problem can be defined as [2]: "Given some acoustic observation O, what is the most likely sentence out of all the sentences in the language?" In mathematical form it can be written as [2]:

$$\hat{W} = \arg\max_{W \in L} P(W \mid O) \qquad (1.1)$$

where O is the sequence of individual observations and W is the sequence of words:

$$O = o_1, o_2, o_3, \ldots, o_t$$
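Equation (1.1) selects the most likely word sequence given the acoustic observations, and it can be illustrated with a toy decoder. This is a minimal sketch, assuming the standard Bayes decomposition P(W | O) ∝ P(O | W) P(W), in which P(O | W) comes from the acoustic model and P(W) from the language model. The candidate sentences, all probability values and the names `decode`, `ACOUSTIC_LOGPROB` and `LM_LOGPROB` are hypothetical, and a real decoder searches a far larger hypothesis space rather than a fixed list.

```python
import math

# Hypothetical candidate word sequences W (romanized Urdu digits).
CANDIDATES = ["aik do teen", "aik do", "aik teen"]

# Hypothetical acoustic scores log P(O | W); values are made up.
ACOUSTIC_LOGPROB = {
    "aik do teen": math.log(0.020),
    "aik do": math.log(0.030),
    "aik teen": math.log(0.025),
}

# Hypothetical language model scores log P(W); values are made up.
LM_LOGPROB = {
    "aik do teen": math.log(0.50),
    "aik do": math.log(0.10),
    "aik teen": math.log(0.20),
}


def decode(candidates, acoustic_logprob, lm_logprob):
    """Return the candidate W that maximizes log P(O | W) + log P(W).

    Maximizing this sum is equivalent to maximizing P(W | O), because
    P(O) is the same for every candidate and can be dropped.
    """
    return max(candidates, key=lambda w: acoustic_logprob[w] + lm_logprob[w])


if __name__ == "__main__":
    print(decode(CANDIDATES, ACOUSTIC_LOGPROB, LM_LOGPROB))
```

With these made-up numbers, the acoustically best hypothesis is "aik do", but the language model prior shifts the decision to "aik do teen". Working in log probabilities avoids numerical underflow when many small probabilities are multiplied.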
Similar resources
An ASR System for Spontaneous Urdu Speech
One of the major hurdles in the development of an Automatic Spontaneous Speech Recognition System is the unavailability of large amounts of transcribed spontaneous speech data for training the system. On the other hand transcribed read speech data is available comparatively easily. This paper explores the possibilities of training a spontaneous speech recognition system by using a mixture of re...
Linear Discriminant Analysis Based Approach for Automatic Speech Recognition of Urdu Isolated Words
Urdu is amongst the five largest languages of the world and enjoys extreme importance by sharing its vocabulary with several other languages of the Indo-Pak. However, there has not been any significant research in the area of Automatic Speech Recognition of Urdu. This paper presents the statistical based classification technique to achieve the task of Automatic Speech Recognition of isolated wo...
Urdu Dependency Parser: A Data-Driven approach
In this paper, we present what we believe to be the first data-driven dependency parser for Urdu. The parser was trained and tuned using the MaltParser system, a system for data-driven dependency parsing. The Urdu dependency treebank (UDT), which is used for training and testing the parser, is also presented for the first time. The UDT contains corpus of 2853 sentences which are annotated at mu...
Error Analysis of Single Speaker Urdu Speech Recognition System
Speaker independent, spontaneous and continuous speech recognition system (ASR) can be integrated to other technologies like mobile to create an interface between technology and illiterate people so that they can use modern technologies. One of the major hurdles in such ASR is unacceptable word error rate. The paper explores the possibility of analyzing the Urdu speech corpus based on recogniti...
Improvements in RWTH LVCSR evaluation systems for Polish, Portuguese, English, Urdu, and Arabic
In this work, Portuguese, Polish, English, Urdu, and Arabic automatic speech recognition evaluation systems developed by the RWTH Aachen University are presented. Our LVCSR systems focus on various domains like broadcast news, spontaneous speech, and podcasts. All these systems but Urdu are used for Euronews and Skynews evaluations as part of the EUBridge project. Our previously developed LVCSR...
Automatic Speech Recognition of Urdu Digits with Optimal Classification Approach
Speech Recognition for Urdu language is an interesting and less developed task. This is primarily due to the fact that linguistic resources such as rich corpus are not available for Urdu. Yet, few attempts have been made for developing Urdu speech recognition frameworks using the traditional approaches such as Hidden Markov Models and Neural Networks. In this work, we investigate the use of thr...